Abstract: The aim of this paper is to show that spatial coupling 
can be viewed not only as a means to build better graphical 
models, but also as a tool to better understand uncoupled models. 
The starting point is the observation that some asymptotic 
properties of graphical models are easier to prove in the case 
of spatial coupling. In such cases, one can then use the so-called 
interpolation method to transfer results known for the spatially 
coupled case to the uncoupled one. 

Our main application of this framework is to LDPC codes, 
where we use interpolation to show that the average entropy of 
the codeword conditioned on the observation is asymptotically 
the same for spatially coupled as for uncoupled ensembles. We 
use this fact to prove the so-called Maxwell conjecture for a large 
class of ensembles. 

In a first paper last year, we successfully implemented 
this strategy for the case of LDPC ensembles where the variable 
node degree distribution is Poisson. In the current paper we 
show how to treat the practically more relevant case of general 
variable degree distributions. In particular, regular ensembles 
fall within this framework. As we will see, a number of technical 
difficulties appear when compared to the simpler case of Poisson-distributed 
degrees. For our arguments to hold we need symmetry 
to be present. For coding, this symmetry follows from the channel 
symmetry; for general graphical models the required symmetry 
is called Nishimori symmetry. 

I. Introduction 

Spatially coupled codes were introduced in [1] under the 
name of convolutional LDPC codes. It was recently proved 
in [2] that spatial coupling can be used as a paradigm to 
build graphical models on which belief-propagation algorithms 
perform essentially optimally. The list of applications of this 
paradigm has expanded in the past years to include coding 
and compressed sensing, to name two of the most important 
ones (see [?] for a review of history and references). But 
spatial coupling can also be useful in a different way: as 
a theoretical tool that improves our understanding of uncoupled 
systems. More specifically, it is sometimes much easier to 
prove that (i) a property of a graphical model holds under 
spatial coupling than for the uncoupled version. If that is the 
case, and if (ii) the coupled and the uncoupled scenarios are 
equivalent with respect to that property, then we obtain a proof 
that the uncoupled graphical system has the said property. 

In this paper we prove a statement of type (ii) in the case of 
LDPC codes. Namely, we prove that the conditional entropy 
in the infinite blocklength limit is the same for the coupled 
and uncoupled versions of the code. This enables us to derive 
the equality of the MAP thresholds for coupled and uncoupled 
codes and allows us to conclude that the Maxwell Conjecture 
[?] (a result of type (i), which we already know holds for 
coupled ensembles) also holds for uncoupled systems. Our 
treatment is general enough to provide a recipe for similar 
results for many types of graphical models that exhibit so- 
called Nishimori symmetry (of which channel symmetry is a 
special case). 

Our proof succeeds by using the interpolation method, 
which was introduced in statistical physics by Guerra and 
Toninelli for the Sherrington-Kirkpatrick spin glasses [?] and 
gradually found its way to constraint satisfaction problems 
[5]-[7] and coding theory [8], [9]. The version we use here 
employs a discrete interpolation between the coupled and 
two versions of the uncoupled scenarios. An error-tolerating 
version of the superadditivity lemma is also borrowed from 
Bayati et al. [?] to show that the conditional entropy has a 
limit for large blocklengths (called thermodynamic limit in 
physics terminology). 

The purpose of this paper is to extend the proof of 
concept presented at ISIT 2012 [10] to arbitrary variable-node 
degree distributions. The technique presented there was 
only amenable to ensembles with Poisson-distributed degrees, 
whose range of applicability in coding is limited. This is due 
to the occurrence of nodes of very small degree in significant 
proportions, which limits the performance. In what follows, we 
remove this technical barrier and allow a wide choice of degree 
distributions, including regular graphs. However, we keep the 
restrictions (see [10]) that the check node degrees have to be 
even and that the channel must be symmetric. The core of the 
proof rests on the interplay of symmetry and evenness. 

II. Preliminaries 
A. Simple ensembles 

We start by describing a simple ensemble of codes, which 
we call LDPC(N, $\Lambda$, K), where N is the number of variable 
nodes, $\Lambda(x) = \sum_{d \ge 0} \Lambda_d x^d$ is the variable-node degree 
distribution, and the integer K is the fixed check-node degree. 
The distribution $\Lambda$ must be supported on a finite subset of the 
positive integers. The average with respect to this distribution 
will be denoted by d. In case $\Lambda$ is supported on a single 
value, we will call the ensemble regular. Next, for each of 
the N variable nodes, the target degree is drawn i.i.d. from 
$\Lambda$, and each variable node is labeled with that many sockets. 
The purpose of a socket is to receive at most one edge from 
a check node, and all edges must be connected to sockets on 
the variable-node side. The number of sockets will thus be a 
random variable which concentrates around Nd. 



The check nodes and the connections are placed in the 
following way: As long as there are at least K free sockets 
(initially all sockets are free), add one new check node 
connected to K free sockets chosen uniformly at random, without 
replacement. The chosen sockets then become occupied. The 
final number of check nodes that are added is exactly $\lfloor D/K \rfloor$, 
where D is the total number of sockets. Note that there will be 
at most K - 1 unconnected sockets at the end of this process, 
so the resulting variable node degrees will not in general 
match the target degrees. However, we will be interested in the 
limit $N \to \infty$, where the distribution of the resulting degrees 
matches $\Lambda$. 
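
The construction above can be sketched in a few lines of Python. This is only an illustrative sketch under our own assumptions: the function name `sample_simple_ensemble`, the dictionary format for the degree distribution, and the use of a shuffled socket list (which is equivalent to drawing free sockets uniformly without replacement) are not part of the original text.

```python
import random

def sample_simple_ensemble(N, degree_dist, K, rng):
    """Sample one graph from the simple ensemble (illustrative sketch).

    degree_dist maps a degree d to its probability Lambda_d (hypothetical
    input format). Returns (checks, degrees, leftover): checks is a list of
    K-tuples of variable-node indices, degrees the target degrees, and
    leftover the number of sockets that stay unconnected.
    """
    degs, probs = zip(*sorted(degree_dist.items()))
    degrees = rng.choices(degs, weights=probs, k=N)  # i.i.d. target degrees
    # One socket per unit of degree; a socket remembers its variable node.
    free = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(free)  # popping from a shuffled list = uniform, no replacement
    checks = []
    while len(free) >= K:  # stop when fewer than K free sockets remain
        checks.append(tuple(free.pop() for _ in range(K)))
    return checks, degrees, len(free)
```

With D sockets in total the loop adds exactly floor(D/K) checks and leaves at most K - 1 sockets free, in line with the description above.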

B. Coupled ensembles 

Intuitively, a coupled ensemble LDPC(N, L, w, $\Lambda$, K) consists 
of a number L of copies of a simple ensemble, with 
interaction between copies allowed, in the sense that a check 
node can be connected to nodes in neighboring copies. More 
precisely, the variable nodes are distributed into L groups, 
which lie on a closed circular chain. The positions are indexed 
by integers modulo L, and we employ the set of representatives 
{1, . . . , L}. It will be useful later to also consider open-ended 
chains (this is in fact the case for spatial coupling used in 
applications). 

Just as for simple ensembles, each node is assigned a 
number of sockets drawn i.i.d. from the distribution $\Lambda$. The 
check nodes, however, are restricted in the following way: 
they are only allowed to connect to sockets whose positions 
lie inside an interval (called window) of length w somewhere 
on the chain, i.e., there exists a position z such that all edges 
are connected to nodes at positions z, z + 1, . . . , z + w - 1. As 
before, check nodes have degree K, and they are sampled 
as follows: first a window is picked uniformly at random, 
then for each edge, a position uniformly and i.i.d. inside that 
window, and then uniformly a free socket at that position. 
In case there are no free sockets in the chosen position, the 
process is stopped. Note that it is possible to stop with a lot 
of empty sockets in the chain: for example in a very unlucky 
case, the same position might be picked all the time. However, 
with high probability, only a small number of sockets will be 
free at the end of the process, and it is easy to see that in the 
limit $N \to \infty$ the rate of the code only depends on d 
and K. The steps in this process will be described in more 
detail in Section V. 
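
A minimal Python sketch of this sampling process on the circular chain is given below. The names, the 0-based positions (the text uses 1, ..., L), and the data layout are our own assumptions. As a simplification, the sketch checks up front whether the drawn positions have enough free sockets and stops otherwise, instead of stopping in the middle of a partially built check.

```python
import random

def sample_coupled_ensemble(N, L, w, degree_dist, K, rng):
    """Sketch of the coupled sampling on a circular chain of L positions.

    Positions are 0..L-1 here. Returns (checks, leftover): each check is a
    K-tuple of sockets, a socket being a (position, node) pair, and leftover
    is the number of sockets still free when the process stops.
    """
    degs, probs = zip(*sorted(degree_dist.items()))
    free = []  # free[z] = list of free sockets at position z
    for z in range(L):
        degrees = rng.choices(degs, weights=probs, k=N)
        free.append([(z, v) for v, d in enumerate(degrees) for _ in range(d)])
    checks = []
    while True:
        z0 = rng.randrange(L)  # window {z0, ..., z0 + w - 1} modulo L
        positions = [(z0 + rng.randrange(w)) % L for _ in range(K)]
        need = {z: positions.count(z) for z in positions}
        if any(len(free[z]) < c for z, c in need.items()):
            break  # a chosen position has run out of free sockets: stop
        check = []
        for z in positions:
            i = rng.randrange(len(free[z]))  # uniform free socket at z
            free[z][i], free[z][-1] = free[z][-1], free[z][i]
            check.append(free[z].pop())
        checks.append(tuple(check))
    return checks, sum(len(f) for f in free)
```

Every socket is consumed at most once, so K times the number of checks plus the leftover count always equals the total number of sockets.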

Note that the ensembles described so far are built in two 
stages: first the vertex degrees are sampled from the distribution 
$\Lambda$ and sockets are attached, obtaining the configuration 
pattern; in the second stage, the check nodes and edges are 
connected within this configuration pattern. Both stages are 
random, except when the degree distribution $\Lambda$ is regular, 
in which case the first stage is deterministic. It will 
sometimes be helpful to separate the two stages and start where 
the configuration pattern is already given. 

This is a good place to observe that the cases where w = 1 
and w = L yield instances of the simple ensemble in the 
following ways: for w = 1, there are L different, non-interacting 
copies of LDPC(N, $\Lambda$, K), whereas for w = L, 
the whole ensemble is equivalent to LDPC(NL, $\Lambda$, K), up to 
a small number of missing check nodes. 

C. Graphical notation 

Traditionally, the Tanner graph is pictured as a bipartite 
graph, with edges linking the variable nodes to the check 
nodes. Here we will consider an equivalent rendering, namely 
as a hypergraph, where the variable nodes are the only nodes, 
and check nodes correspond to K-ary hyperedges, i.e., K-tuples 
of variable nodes. 

The check nodes have fixed even degree K, and we think 
of them as vectors $a = (a_1, \dots, a_K)$ of variable nodes, 
thereby incorporating the edge information in the graph. A 
code is then specified by the totality of check node connections 
corresponding to its Tanner graph. Thus, with a slight abuse of 
standard terminology, we will say that a graph G is just a set 
of check constraints of the type $a = (a_1, \dots, a_K)$. In general 
we will use the letters a, b, c, . . . to describe check constraints, 
u, v, . . . to describe variable nodes, and G, G', G'', . . . to describe 
graphs. 

D. Transmission over channel 

We use these codes to transmit over a binary memoryless 
symmetric channel $p_{Y|X}(y|x)$, where the input symbol set is 
$\mathcal{X} = \{+1, -1\}$. For just one use of the channel, it is enough 
to consider the half-log-likelihood-ratios (HLLR) $h(y)$ instead 
of the actual outputs y, since they form a sufficient statistic. 
They are defined (bit-wise) as 

\[
h(y) = \frac{1}{2} \ln \frac{p_{Y|X}(y|+1)}{p_{Y|X}(y|-1)},
\]

and one can recover the posterior probability that the bit x 
was sent. The latter is easily seen to be proportional to $e^{x h(y)}$. 
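
As a small illustration, the following sketch computes the HLLR for a binary symmetric channel with crossover probability p (our choice of example channel; the function names are hypothetical) and recovers the posterior of a uniformly distributed input bit from the HLLR alone.

```python
import math

def hllr_bsc(y, p):
    """Half-log-likelihood ratio of an output y in {+1, -1} on a BSC(p)."""
    def likelihood(x):
        return 1 - p if y == x else p  # p(y | x)
    return 0.5 * math.log(likelihood(+1) / likelihood(-1))

def posterior(x, h):
    """P(x | y) for a uniform input bit, computed from h = h(y) only."""
    return math.exp(x * h) / (math.exp(h) + math.exp(-h))
```

For example, `posterior(+1, hllr_bsc(+1, 0.1))` gives back 1 - p = 0.9 up to rounding, which is the sense in which h(y) is a sufficient statistic.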

We now consider sending the whole input vector, which will 
usually be denoted by $\sigma \in \{-1,+1\}^V$, where V is the set of variable 
nodes. Instead of the outputs, we use the HLLRs $h \in \mathbb{R}^V$, 
given by $h_v = h(y_v)$, where y is the output vector. 

The posterior probability that the codeword $\sigma$ was sent is 
proportional to $e^{h \cdot \sigma}$, where $h \cdot \sigma$ stands for $\sum_{v \in V} h_v \sigma_v$. The 
full expression for the posterior probability is given by 

\[
\mu(\sigma) = \frac{e^{h \cdot \sigma} \prod_{a \in G} (1 + \sigma_a)/2}{Z(G)}, \tag{1}
\]

where $\sigma_a$ is short for the product $\sigma_{a_1} \cdots \sigma_{a_K}$ and $Z(G)$ is a 
normalizing factor, also called partition function, given by 

\[
Z(G) = \sum_{\sigma \in \{-1,+1\}^V} e^{h \cdot \sigma} \prod_{a \in G} \frac{1 + \sigma_a}{2}.
\]

One can easily check that the product $\prod_{a \in G} (1 + \sigma_a)/2$ is 
1 when $\sigma$ is any codeword, and 0 otherwise. Also, we have 
denoted this probability measure by $\mu$ in order to distinguish 
it from other randomized parameters that appear, notably the 
channel and the randomness in the graph G. Note that $\mu$ 
depends on both G and the HLLRs h, and when this is not 
clear we will make it explicit by adding G or h as a subscript: 
$\mu_{G,h}$, $Z(G, h)$. This measure is akin to the Gibbs measure in 
statistical physics, and we will call it as such, in order to 
distinguish it from other measures. 

The average with respect to the measure $\mu$ will appear quite 
often in the rest of the paper, so we use angular brackets 
$\langle \cdot \rangle$ (referred to as Gibbs brackets) to indicate it. In other words, 

\[
\langle f(\sigma) \rangle = \sum_{\sigma \in \{-1,+1\}^V} f(\sigma) \, \mu(\sigma).
\]

Regarding notation, the same subscript conventions that apply to 
$\mu$ also apply to the bracket. 

Because of channel symmetry, we will assume without 
loss of generality that the all-(+1) codeword is sent. Thus all 
randomness in the channel is captured by the i.i.d. HLLRs 
$h_v$, $v \in V$. Note that the channel is independent of the 
code ensemble. When taking expectations with respect to 
the channel we will use the symbol $\mathbb{E}_h$, and whenever the 
expectation is with respect to an ensemble of graphs we will 
use $\mathbb{E}_{G:\mathcal{G}}$, where $\mathcal{G}$ denotes the ensemble of graphs. Note that 
the two expectations in fact commute, so we are not required 
to use different symbols, but we still do in order to make the 
source of randomness apparent. It is important to remember 
that the averages denoted by $\mathbb{E}$ and the bracketed average do 
not commute. In the language of statistical physics, the graph 
and the channel are said to be quenched. 

There is a deep and useful connection between $\ln Z(G)$ and 
the conditional entropy $H(X|Y)$ (where X is the input vector 
and Y the output vector). We would like to express our results 
in terms of the latter, which carries more information-theoretic 
intuition, but we find it more natural to work with the former. 
The link is captured by the following widely known result (for 
the proof, see, for example, [11]). 

Lemma 1. For an arbitrary code of block length N represented 
by a graph G, we have 

\[
H(X|Y) = \mathbb{E}_h \ln Z(G, h) - N \, \mathbb{E}_h[h].
\]

The meaning is that the two quantities are the same, up to a 
term that only depends on the channel (and not on the code). 
III. Outline of the results 

A. Comparison of entropies 

We will set up the machinery of the interpolation method 
and direct it at proving the following theorem (for the proof, 
see Section VIII), which states that the entropies of the simple 
and coupled ensembles are asymptotically the same in the 
large N limit. 

Theorem 2. Let L, w, K be integers such that $L \ge w \ge 1$ and 
K is even, and let $\Lambda$ be a degree distribution with finite support. 
Then for a fixed BMS channel we have 

\[
\lim_{N \to \infty} \frac{1}{N} \mathbb{E}_{G:\mathrm{LDPC}(N,\Lambda,K)} H(X|Y)
= \lim_{N \to \infty} \frac{1}{NL} \mathbb{E}_{G:\mathrm{LDPC}(N,L,w,\Lambda,K)} H(X|Y). \tag{2}
\]

Consider a smooth family of BMS channels ordered by 
degradation, indexed by a noise parameter $\epsilon$. Without loss of 
generality, we may assume that the parameter $\epsilon$ is the channel 
entropy $H(Y_i|X_i)$, which varies between 0 (the perfect channel) 
and 1 (the useless channel). Then there exists a value $\epsilon_{\mathrm{MAP}}$ 
(called MAP threshold) such that for channel parameters below 
this value, the scaled average conditional entropy (quantities 
of the kind appearing on both sides of (2)) converges to zero 
in the infinite block length limit, while above this value it is 
positive. 

More formally, for the two kinds of LDPC ensembles, we 
define the MAP threshold in the following manner: 

\[
\epsilon_{\mathrm{MAP}} = \inf \Big\{ \epsilon : \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}_{G:\mathrm{LDPC}(N,\Lambda,K)} H(X|Y) > 0 \Big\},
\]

\[
\epsilon_{\mathrm{MAP}}^{L,w} = \inf \Big\{ \epsilon : \liminf_{N \to \infty} \frac{1}{NL} \mathbb{E}_{G:\mathrm{LDPC}(N,L,w,\Lambda,K)} H(X|Y) > 0 \Big\}.
\]

These definitions employ liminf and are meaningful even 
when the existence of limits is not guaranteed. However, in our 
case, the existence of limits is part of the result of Theorem 2, 
and we could have replaced liminf by lim. The theorem 
obviously implies the equality of the two MAP thresholds. 

Corollary 3. With the same assumptions as in Theorem 2, we 
have $\epsilon_{\mathrm{MAP}} = \epsilon_{\mathrm{MAP}}^{L,w}$. 

B. The proof of the Maxwell Conjecture 

As an application of this, we will prove the Maxwell 
conjecture for a large class of degree distributions. Let us 
recall the statement of the conjecture. Let $\epsilon_{\mathrm{Area}}$ be the area 
threshold, defined as the value for which the integral of the 
BP-GEXIT curve over the interval $[\epsilon_{\mathrm{Area}}, 1]$ equals the design rate 
$1 - d/K$ (for more details, see [3]). The Maxwell conjecture 
states that $\epsilon_{\mathrm{Area}} = \epsilon_{\mathrm{MAP}}$. 
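
To make the notion of area threshold concrete, the sketch below evaluates it for the special case of the binary erasure channel and a regular ensemble with variable degree l and check degree r. This is an illustration only and rests on assumptions brought in from the BEC literature rather than from this paper: on the BEC the BP-GEXIT curve coincides with the BP-EXIT curve, which has the parametric form eps(x) = x / (1 - (1-x)^(r-1))^(l-1), g(x) = (1 - (1-x)^(r-1))^l for x between the BP fixed point x_BP and 1. The function name and the numerical scheme (trapezoidal rule plus bisection) are our own choices.

```python
def area_threshold_bec(l, r, grid=4000):
    """Area threshold of the regular (l, r) ensemble on the BEC (sketch).

    Assumes the parametric BP-EXIT curve
        eps(x) = x / (1 - (1-x)^(r-1))^(l-1),  g(x) = (1 - (1-x)^(r-1))^l,
    and finds x* such that the area under g from eps(x*) to 1 equals the
    design rate 1 - l/r.
    """
    def y(x):
        return 1.0 - (1.0 - x) ** (r - 1)

    def eps(x):
        return x / y(x) ** (l - 1)

    def g(x):
        return y(x) ** l

    rate = 1.0 - l / r

    def area(x_star):
        # Trapezoidal rule for the integral of g d(eps) from x_star to 1.
        xs = [x_star + (1.0 - x_star) * i / grid for i in range(grid + 1)]
        return sum(0.5 * (g(a) + g(b)) * (eps(b) - eps(a))
                   for a, b in zip(xs, xs[1:]))

    # x_BP minimises eps(x); the relevant branch is x in [x_BP, 1].
    x_bp = min((i / grid for i in range(1, grid + 1)), key=eps)
    lo, hi = x_bp, 1.0  # area(lo) > rate > area(hi) = 0
    for _ in range(50):  # bisection on the decreasing function area(x)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if area(mid) > rate else (lo, mid)
    return eps(0.5 * (lo + hi))
```

For the (3, 6)-regular ensemble this gives a value close to 0.4881, strictly above the BP threshold of about 0.4294.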

The following was recently proved in [2]. For a large class 
of LDPC ensembles, if we consider the corresponding coupled 
ensemble, then the BP threshold (and hence, by threshold 
saturation, the MAP threshold) is very well approximated by 
$\epsilon_{\mathrm{Area}}$ (of the simple ensemble) in the following sense: 

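\[
\epsilon_{\mathrm{Area}} - O(\,\cdot\,) \le \epsilon_{\mathrm{MAP}}^{L,w,\mathrm{open}} \le \epsilon_{\mathrm{Area}} + O(\,\cdot\,). \tag{3}
\]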

The threshold $\epsilon_{\mathrm{MAP}}^{L,w,\mathrm{open}}$ is that of the open coupled 
chain, which is constructed such that the positions on the 
chain are from {1, . . . , L}, but the windows do not "wrap 
around". Instead we add ghost variable nodes at positions 
$-w+2, \dots, -1, 0$ and $L+1, \dots, L+w-1$, whose input 
bits will always be fixed to +1. The windows are of the form 
$\{z, \dots, z+w-1\}$, where $z = -w+2, \dots, L$. 

The only difference in the average conditional entropy of 
the open and closed chains comes from the check nodes that 
lie at the boundary of the chain. The proportion of these check 
nodes is $O(w/L)$. By an application of the second statement in 
Lemma 5, the difference of the entropies is at most $O(w/L)$, 
which goes to 0 as $L \to \infty$. As a consequence, 

\[
\lim_{L \to \infty} \epsilon_{\mathrm{MAP}}^{L,w} = \lim_{L \to \infty} \epsilon_{\mathrm{MAP}}^{L,w,\mathrm{open}}.
\]

Thus by (3) and Corollary 3 we deduce that in fact $\epsilon_{\mathrm{MAP}}$ 
equals $\epsilon_{\mathrm{Area}}$, by first taking the limit $L \to \infty$ and then $w \to \infty$. 
This completes the proof of the Maxwell conjecture for all 
those LDPC ensembles for which (3) is known. 

C. Proof of the equality of the MAP- and the BP-GEXIT 
curves above the MAP threshold 

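In the rest of this section we will only work with uncoupled 
systems, so the ensemble over which we average is always 
LDPC(N, $\Lambda$, K). Also, in order to make clear that the channel 
output depends on the channel entropy parameter $\epsilon$, we will 
write the former as $Y(\epsilon)$. The MAP-GEXIT function $g_{\mathrm{MAP}}$ is 
defined as 

\[
g_{\mathrm{MAP}}(\epsilon) = \limsup_{N \to \infty} \frac{1}{N} \sum_{v \in V} \mathbb{E}_G \big[ H(X_v \mid Y_{\sim v}(\epsilon)) \big], \tag{4}
\]

where $\sim v$ represents the set of all nodes except v. Equivalently 
[11, Section III], the MAP-GEXIT function also takes the form 

\[
g_{\mathrm{MAP}}(\epsilon) = \limsup_{N \to \infty} \frac{1}{N} \mathbb{E}_G \Big[ \frac{d}{d\epsilon} H(X \mid Y(\epsilon)) \Big]. \tag{5}
\]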

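The latter formulation can then be employed to lower bound 
the area below $g_{\mathrm{MAP}}$ above the MAP threshold as follows: 

\[
\begin{aligned}
\int_{\epsilon_{\mathrm{MAP}}}^{1} g_{\mathrm{MAP}}(\epsilon) \, d\epsilon
&= \int_{\epsilon_{\mathrm{MAP}}}^{1} \limsup_{N \to \infty} \frac{1}{N} \mathbb{E}_G \Big[ \frac{d}{d\epsilon} H(X|Y(\epsilon)) \Big] d\epsilon \\
&\overset{(a)}{\ge} \limsup_{N \to \infty} \int_{\epsilon_{\mathrm{MAP}}}^{1} \frac{1}{N} \mathbb{E}_G \Big[ \frac{d}{d\epsilon} H(X|Y(\epsilon)) \Big] d\epsilon \\
&= \limsup_{N \to \infty} \Big[ \frac{1}{N} \mathbb{E}_G H(X|Y(1)) - \frac{1}{N} \mathbb{E}_G H(X|Y(\epsilon_{\mathrm{MAP}})) \Big] \\
&\overset{(b)}{=} R - \liminf_{N \to \infty} \frac{1}{N} \mathbb{E}_G H(X|Y(\epsilon_{\mathrm{MAP}})) \\
&\overset{(c)}{\ge} R,
\end{aligned}
\tag{6}
\]

where in step (a) we used the Reverse Fatou Lemma (note 
that the integrand on the r.h.s. is bounded, see for example 
[?]), and in step (b), since at $\epsilon = 1$ the channel is completely 
useless, we have that $H(X|Y(1)) = H(X)$ and $\frac{1}{N} \mathbb{E}_G[H(X)]$ 
is the rate of the code in the large blocklength limit. For step 
(c), by the definition of the MAP threshold, for any $\epsilon < \epsilon_{\mathrm{MAP}}$ 
we have $\liminf_{N \to \infty} \frac{1}{N} \mathbb{E}_G H(X|Y(\epsilon)) = 0$, and we will now 
argue for left-continuity at $\epsilon_{\mathrm{MAP}}$. The conditional entropy term 
can be expanded using 

\[
H(X|Y(\epsilon)) = H(X) + H(Y(\epsilon)|X) - H(Y(\epsilon)). \tag{7}
\]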

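In the large blocklength limit, the first two terms on the r.h.s. 
are related to the rate and the one-bit channel entropy, i.e., 

\[
\lim_{N \to \infty} \frac{1}{N} \mathbb{E}_G \big[ H(X) + H(Y(\epsilon)|X) \big] = R + \epsilon
\]

for any entropy parameter $\epsilon \in [0, 1]$. Then by taking limits in 
(7), we get 

\[
\liminf_{N \to \infty} \frac{1}{N} \mathbb{E}_G H(X|Y(\epsilon)) = R + \epsilon - \limsup_{N \to \infty} \frac{1}{N} \mathbb{E}_G H(Y(\epsilon)). \tag{8}
\]

We now show that the l.h.s. as a function of $\epsilon$ is left-continuous 
at $\epsilon_{\mathrm{MAP}}$. Suppose it is not; then clearly it must 
have a positive jump at $\epsilon_{\mathrm{MAP}}$. According to (8), there is then 
a negative jump in $\limsup_{N \to \infty} \frac{1}{N} \mathbb{E}_G H(Y(\epsilon))$. Because 
$H(Y(\epsilon))$ is an increasing function of $\epsilon$ for any graph, 
this is a contradiction. Thus, we obtain that 
$\liminf_{N \to \infty} \frac{1}{N} \mathbb{E}_G H(X|Y(\epsilon_{\mathrm{MAP}})) = 0$, which concludes the 
discussion of inequality (6). 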

We now define the BP-GEXIT curve in a similar way, except 
that now in the calculation of the "extrinsic" entropy of bit $X_v$ 
we only consider the information in the computation tree of 
depth $\ell$ rather than all of $Y_{\sim v}$. Denoting the subset of outputs 
occurring in the computation tree of depth $\ell$ by $Y^{\ell}_{\sim v}(\epsilon)$ (see [12, 
Section III]), we can formalize the previous statement as 

\[
g_{\mathrm{BP}}(\epsilon) = \lim_{\ell \to \infty} \limsup_{N \to \infty} \mathbb{E}_G \Big[ \frac{1}{N} \sum_{v \in V} H(X_v \mid Y^{\ell}_{\sim v}(\epsilon)) \Big]. \tag{9}
\]

We can then easily see by the data processing inequality 
(see also Lemma 9 in [12]) that 

\[
g_{\mathrm{MAP}}(\epsilon) \le g_{\mathrm{BP}}(\epsilon), \quad \text{for all } \epsilon \in [0, 1]. \tag{10}
\]

The area threshold mentioned before is defined as the 
solution $\epsilon_{\mathrm{Area}}$ to the equation 

\[
\int_{\epsilon_{\mathrm{Area}}}^{1} g_{\mathrm{BP}}(\epsilon) \, d\epsilon = R. \tag{11}
\]

Then, using the equality of the MAP and area thresholds 
established in the previous subsection for the above-mentioned 
class of LDPC codes, together with (6) and (11), we obtain 

\[
\int_{\epsilon_{\mathrm{MAP}}}^{1} \big( g_{\mathrm{BP}}(\epsilon) - g_{\mathrm{MAP}}(\epsilon) \big) \, d\epsilon \le R - R = 0. \tag{12}
\]

The nonnegativity of the integrand (cf. (10)) entails the equality 
of the two curves almost everywhere above the MAP threshold. 

IV. Some useful lemmas 

We present in this section two results that are quite general 
in nature, meaning that they are true for any linear code. They 
already appear in [8], [13], but we reproduce short proofs here 
in order to make the paper self-contained. The symmetry of the 
channel is a property that seems indispensable for the proofs 
in the rest of this paper, and we will need it in the form of 
the Nishimori Identity. 

Lemma 4 (Nishimori Identity). Fix a graph G (it can be 
general, no constraints on the check node degrees are needed 
here). For any odd positive integer m we have 

\[
\mathbb{E}_h \big[ \langle \sigma_b \rangle^m \big] = \mathbb{E}_h \big[ \langle \sigma_b \rangle^{m+1} \big], \tag{13}
\]

where $b = (b_1, \dots, b_j)$ is a vector of variables (which need 
not be interpreted as a check constraint) of arbitrary length, 
and $\sigma_b = \sigma_{b_1} \cdots \sigma_{b_j}$. The channel used for transmission needs 
to be BMS, symmetry being the crucial ingredient. 



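Proof: It is easy to see that the symmetry of the channel 
implies a property also called symmetry of the HLLRs $h_v$ (as 
distributions), expressible as $p_{Y|X}(h_v | +1) = p_{Y|X}(-h_v | +1) e^{2 h_v}$ 
(by a slight abuse of notation, $p_{Y|X}(h_v | +1)$ is 
the HLLR distribution). Then there exists a density function 
f with nonnegative support such that $p_{Y|X}(h_v | +1) = f(|h_v|) e^{h_v}$. 
Using the memoryless property of the channel, 
the l.h.s. of (13) can be written as 

\[
\mathbb{E}_h \big[ \langle \sigma_b \rangle^m \big] = \int \langle \sigma_b \rangle^m \prod_{v \in V} e^{h_v} f(|h_v|) \, dh_v. \tag{14}
\]

We now observe that due to channel symmetry the above 
quantity is preserved under the transformation (called gauge 
transformation in physics) $h_v \mapsto h_v \tau_v$, $\sigma_v \mapsto \sigma_v \tau_v$, if $\tau$ 
is a codeword. As a matter of fact, the transformed HLLRs 
$h_v \tau_v$ are those received when the codeword $\tau$ was transmitted, 
instead of the all-(+1) codeword. 

We now perform an average over all codewords $\tau$, obtaining 

\[
\mathbb{E}_h \big[ \langle \sigma_b \rangle^m \big] = \frac{1}{|C(G)|} \sum_{\tau \in C(G)} \int \langle \sigma_b \tau_b \rangle^m \, e^{h \cdot \tau} \prod_{v \in V} f(|h_v|) \, dh_v,
\]

where $C(G)$ is the set of all codewords. 

Note that the Gibbs bracket above averages over $\sigma$, and 
thus we can safely take $\tau_b$ out of the bracket. Since m is 
odd, $\tau_b^m = \tau_b$. Next we use the definition of the Gibbs measure 
(equation (1)) to replace $\sum_{\tau \in C(G)} e^{h \cdot \tau} \tau_b$ with $Z(G) \langle \sigma_b \rangle$. We 
obtain 

\[
\mathbb{E}_h \big[ \langle \sigma_b \rangle^m \big] = \frac{1}{|C(G)|} \int Z(G) \, \langle \sigma_b \rangle^{m+1} \prod_{v \in V} f(|h_v|) \, dh_v. \tag{15}
\]

Expanding $Z(G, h)$ into $\sum_{\lambda \in C(G)} e^{h \cdot \lambda}$ we get 

\[
\mathbb{E}_h \big[ \langle \sigma_b \rangle^m \big] = \frac{1}{|C(G)|} \sum_{\lambda \in C(G)} \int \langle \sigma_b \rangle^{m+1} \, e^{h \cdot \lambda} \prod_{v \in V} f(|h_v|) \, dh_v.
\]

A second gauge transformation $h_v \mapsto h_v \lambda_v$, $\sigma_v \mapsto \sigma_v \lambda_v$ 
allows us to cancel all $\lambda$ factors, since $\lambda_v^2 = 1$. All $|C(G)|$ 
terms in the sum are equal, so the expression simplifies to 

\[
\mathbb{E}_h \big[ \langle \sigma_b \rangle^m \big] = \int \langle \sigma_b \rangle^{m+1} \prod_{v \in V} e^{h_v} f(|h_v|) \, dh_v, \tag{16}
\]

and thus the claim follows. ■ 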
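
The identity can be verified exactly on a small example. The following sketch is an illustration under our own assumptions (a binary symmetric channel BSC(p), a toy graph, brute-force enumeration, hypothetical function names): it computes the moments of the Gibbs average of sigma_b with the all-(+1) codeword sent, so that consecutive odd and even moments can be compared.

```python
import itertools
import math

def bracket(checks, h, b):
    """Gibbs average <sigma_b> for the graph `checks` and HLLR vector h."""
    num = den = 0.0
    for sigma in itertools.product((+1, -1), repeat=len(h)):
        # Only codewords (all check products equal to +1) carry weight.
        if all(math.prod(sigma[v] for v in a) == 1 for a in checks):
            wgt = math.exp(sum(hv * sv for hv, sv in zip(h, sigma)))
            den += wgt
            num += wgt * math.prod(sigma[v] for v in b)
    return num / den

def moment(checks, N, p, b, m):
    """E_h[<sigma_b>^m] on a BSC(p) with the all-(+1) codeword sent."""
    h0 = 0.5 * math.log((1 - p) / p)
    total = 0.0
    for y in itertools.product((+1, -1), repeat=N):
        flips = y.count(-1)
        prob = (1 - p) ** (N - flips) * p ** flips  # P(y | all +1 sent)
        total += prob * bracket(checks, [h0 * yv for yv in y], b) ** m
    return total
```

With, say, `G = [(0, 1, 2, 3), (2, 3, 4, 5)]` and `b = (0, 4)`, the values `moment(G, 6, 0.15, b, 1)` and `moment(G, 6, 0.15, b, 2)` coincide up to rounding, and likewise for m = 3 and m = 4.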
The next result quantifies the effect on $\ln Z$ of one extra 
check node added to some general linear code. This is the 
main reason why we chose to work with $\ln Z$ instead of the 
conditional entropy. 

Lemma 5. Given any graph G and an additional check 
constraint b, we have that 

\[
\mathbb{E}_h \big[ \ln Z(G \cup b) - \ln Z(G) \big] = -\ln 2 + \sum_{r \in 2\mathbb{Z}_+} \frac{1}{r(r-1)} \mathbb{E}_h \big[ \langle \sigma_b \rangle_G^r \big].
\]

In particular, $-\ln 2 \le \mathbb{E}_h [ \ln Z(G \cup b) - \ln Z(G) ] \le 0$. 

The second part of the statement shows that the contribution 
of one extra check node gives only a finite variation in $\ln Z$, 
and it turns out to be very useful for the cases where we 
need to show that two similar ensembles have log-partition 
functions that are asymptotically identical. 

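Proof: Using the definition of the partition function 
$Z(G \cup b)$, we are able to write 

\[
Z(G \cup b) = \sum_{\sigma} e^{h \cdot \sigma} \, \frac{1 + \sigma_b}{2} \prod_{a \in G} \frac{1 + \sigma_a}{2} = Z(G) \Big\langle \frac{1 + \sigma_b}{2} \Big\rangle_G.
\]

Then $\ln Z(G \cup b) - \ln Z(G) = -\ln 2 + \ln(1 + \langle \sigma_b \rangle)$. 
Expanding the logarithm into power series, we obtain 

\[
\ln(1 + \langle \sigma_b \rangle) = \sum_{j \ge 1} \frac{(-1)^{j+1}}{j} \langle \sigma_b \rangle^j. \tag{17}
\]

We now use the Nishimori Identities (Lemma 4) with 

\[
\mathbb{E}_h \big[ \langle \sigma_b \rangle^{j-1} \big] = \mathbb{E}_h \big[ \langle \sigma_b \rangle^{j} \big]
\]

for even j. This allows us to merge each odd-index term 
with the following term, since $\frac{1}{j-1} - \frac{1}{j} = \frac{1}{j(j-1)}$, proving 
the claim. ■ 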
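
As a sanity check, the statement of Lemma 5 can be evaluated by brute force on a toy example. The sketch below is illustrative only and rests on our own assumptions: a binary symmetric channel BSC(p), a small graph, hypothetical function names, and a truncation of the series over even r after `terms` summands (the neglected tail is at most of order 1/(4 * terms)).

```python
import itertools
import math

def gibbs(checks, h, b):
    """Return (ln Z(G, h), <sigma_b>_G) by brute-force enumeration."""
    z = num = 0.0
    for s in itertools.product((+1, -1), repeat=len(h)):
        if all(math.prod(s[v] for v in a) == 1 for a in checks):
            wgt = math.exp(sum(hv * sv for hv, sv in zip(h, s)))
            z += wgt
            num += wgt * math.prod(s[v] for v in b)
    return math.log(z), num / z

def lemma5_sides(checks, N, p, b, terms=2000):
    """Both sides of Lemma 5 on a BSC(p); the series is cut after `terms`."""
    h0 = 0.5 * math.log((1 - p) / p)
    lhs = series = 0.0
    for y in itertools.product((+1, -1), repeat=N):
        flips = y.count(-1)
        prob = (1 - p) ** (N - flips) * p ** flips  # P(y | all +1 sent)
        h = [h0 * yv for yv in y]
        ln_z, sb = gibbs(checks, h, b)
        ln_z_b, _ = gibbs(checks + [b], h, b)  # graph with the extra check b
        lhs += prob * (ln_z_b - ln_z)
        series += prob * sum(sb ** r / (r * (r - 1))
                             for r in range(2, 2 * terms + 1, 2))
    return lhs, -math.log(2) + series
```

For `checks = [(0, 1, 2, 3)]`, `N = 5`, `b = (0, 1, 3, 4)` and `p = 0.1`, the two returned values agree to within the truncation error, and the first one lies between -ln 2 and 0.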
Let us now analyze the terms of the form $\langle \sigma_b \rangle^r$ that appear 
in the last lemma. For this purpose, we will work with the 
product measure $\mu^{\otimes r}$. The measure space here is the one of 
r-tuples $(\sigma^{(1)}, \dots, \sigma^{(r)})$, where $\sigma^{(i)} \in \{-1,+1\}^V$. Because the product 
measure is just the measure of r independent copies of the 
measure $\mu$ (henceforth called replicas), it is easy to check that 

\[
\langle \sigma_b \rangle_G^r = \big\langle \sigma_b^{(1)} \cdots \sigma_b^{(r)} \big\rangle_G^{\otimes r}.
\]

The $\otimes r$ sign at the top right of the bracket is just to remind 
us that we deal with the product measure $\mu^{\otimes r}$. Since this is 
evident from context, we will drop this sign in the future. We 
are then able to restate the last lemma as follows. 

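Corollary 6. Given any graph G and an additional check 
constraint b, we have that 

\[
\mathbb{E}_h \big[ \ln Z(G \cup b) - \ln Z(G) \big]
= -\ln 2 + \mathbb{E}_h \sum_{r \in 2\mathbb{Z}_+} \frac{1}{r(r-1)} \big\langle \sigma_b^{(1)} \cdots \sigma_b^{(r)} \big\rangle_G. \tag{18}
\]

V. The configuration model 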



In this section we introduce the language needed to describe 
and dissect all the kinds of ensembles that we need. We 
assume that the configuration pattern introduced in Section 
II-B is already fixed, i.e., it has been properly sampled at an 
earlier stage, and there are at least $Nd(1 - N^{-\eta})$ and at most 
$Nd(1 + N^{-\eta})$ sockets at every position. By a straightforward 
application of an Azuma-Hoeffding type of inequality and 
the union bound over all positions, this happens with high 
probability in the first stage, as long as $0 < \eta < 1/2$. (By 
"with high probability" we mean that the event in question happens 
with probability $1 - o(1/\mathrm{poly}(N))$. The parameters L and w are 
considered constant for this purpose.) The 
fixed underlying configuration pattern is always of the coupled 
kind, i.e., there are L groups of N variable nodes each; 
the simple kind will arise from the conditions w = 1 and 
w = L. Given the fixed configuration pattern, each variable 
node v has a target degree d(v), and exactly d(v) sockets 
numbered from 1 to d(v). Given a socket s, let var(s) denote 
the variable node that it is part of; by $\sigma_s$ we understand 
$\sigma_{\mathrm{var}(s)}$. Let pos(v) denote the position of the variable v, 
with the notation extending to sockets in the obvious manner: 
pos(s) = pos(var(s)). We also set S to be the set of all 
sockets and put $S_z = \{ s \in S : \mathrm{pos}(s) = z \}$, i.e., the set of 
sockets at a particular position. 

Check nodes will connect to sockets, so a check node a 
will have the form of a K-tuple $(a_1, \dots, a_K)$, where the 
components $a_j$ are sockets. Note that the ordering of the 
edges leaving the check node matters, so the check also 
"stores" this information. We say that a check node a has 
type $\alpha = (\alpha_1, \dots, \alpha_K)$ if $\alpha_j = \mathrm{pos}(a_j)$ for all $1 \le j \le K$. 
In other words, the type records the positions of the variable 
nodes to which the check node a connects. 

We now consider random types, of which there are three 
kinds that are important to us: 

• The connected random type. This random type is 
uniformly distributed over the set of all possible 
types. We denote this distribution by conn. 

• The disconnected random type. This type is uniformly 
distributed over the set of all types whose entries are all 
equal, i.e., types of the form (z, z, . . . , z). We denote this 
distribution by disc. 

• The coupled random type. We choose a position z 
uniformly at random and the result is a type uniformly 
distributed over the set of all types whose entries lie in 
the set $\{z, \dots, z+w-1\}$. We denote this distribution by 
coup. 
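
The three distributions can be sampled as in the following Python sketch. The function name `sample_type` and the string labels are our own illustrative choices; positions are 1, ..., L on a circular chain, as in the text.

```python
import random

def sample_type(kind, L, w, K, rng):
    """Draw one random type of the given kind (illustrative sketch)."""
    if kind == "conn":  # uniform over all K-tuples of positions
        return tuple(rng.randrange(L) + 1 for _ in range(K))
    if kind == "disc":  # all K entries equal to one uniform position
        z = rng.randrange(L) + 1
        return (z,) * K
    if kind == "coup":  # uniform window start, then i.i.d. entries inside it
        z = rng.randrange(L)
        return tuple((z + rng.randrange(w)) % L + 1 for _ in range(K))
    raise ValueError(f"unknown kind: {kind}")
```

Note that with w = 1 the coupled type has the same distribution as the disconnected one, and with w = L the same distribution as the connected one.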

We now define the positional occupation vector $\mathrm{occ}_\alpha$ of a 
type $\alpha$ to be a vector whose z-th entry counts the number of 
occurrences of position z in type $\alpha$. As an example, if K = 6, 
$\alpha = (1,3,2,5,1,3)$, and there are L = 5 
positions, then $\mathrm{occ}_\alpha = (2, 1, 2, 0, 1)$. 

Given a multiset of types $\Gamma$ (a set of types where duplicates 
can appear), we extend the definition of the positional occupation 
vector to $\mathrm{occ}_\Gamma = \sum_{\alpha \in \Gamma} \mathrm{occ}_\alpha$. We call a multiset of 
types m-admissible if $\mathrm{occ}_\Gamma(z) \le |S_z| - m$ for all positions z. 
In other words, an m-admissible multiset of types $\Gamma$ ensures that 
there exists a graph G whose check constraints match one-to-one 
the types in $\Gamma$ (we say that G is compatible with $\Gamma$), and 
in addition, there are at least m sockets at each position that 
remain free. We will also use the word admissible to mean 
0-admissible. One should think about the multiset of types as 
being a kind of "pre-graph", where only the positions of the 
edges are decided, but not yet the actual sockets. 
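
These definitions translate directly into code. In the sketch below, the function names and the list `sockets_per_position` (holding the number of sockets at each position 1, ..., L) are hypothetical names introduced for illustration.

```python
from collections import Counter

def occ(alpha, L):
    """Positional occupation vector of a single type alpha."""
    counts = Counter(alpha)
    return tuple(counts[z] for z in range(1, L + 1))

def occ_multiset(types, L):
    """Entry-wise sum of the occupation vectors of all types in a multiset."""
    total = [0] * L
    for alpha in types:
        for z in alpha:
            total[z - 1] += 1
    return tuple(total)

def is_m_admissible(types, sockets_per_position, m):
    """Check occ(z) <= |S_z| - m for every position z."""
    L = len(sockets_per_position)
    return all(o <= s - m
               for o, s in zip(occ_multiset(types, L), sockets_per_position))
```

For the example above, `occ((1, 3, 2, 5, 1, 3), 5)` returns `(2, 1, 2, 0, 1)`.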

The random graph generated by an admissible multiset of 
types $\Gamma$ is simply given by the uniform measure over all graphs 
that are compatible with $\Gamma$. To sample this random graph, 
the algorithm is as follows: start with the empty graph; for 
each type $\alpha = (\alpha_1, \dots, \alpha_K)$ in the multiset $\Gamma$ (the order 
is immaterial), pick distinct $a_i$ uniformly at random from 
the free sockets at position $\alpha_i$, and add the check constraint 
$(a_1, \dots, a_K)$ to the graph. We will use this check-generating 
procedure often, so we will say that check constraint a is 
chosen according to the distribution $\nu(\alpha, G)$, which depends on the 
type $\alpha$ and on the part G of the graph that is already in place. Let 
$B_\alpha$ be the set of check constraints that are compatible with $\alpha$ 
and are connected to free sockets (sockets that do not appear 
in G). Note that a socket must never be used twice, so they 
are chosen without replacement. Then $\nu(\alpha, G)$ is the uniform 
measure on $B_\alpha$. 
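
The check-generating procedure can be sketched as follows. The function name and the dictionary `free_sockets` (mapping a position to the list of its free socket identifiers) are assumptions made for illustration; the sketch also assumes that the multiset of types is admissible and does not handle the non-admissible case.

```python
import random

def sample_graph(types, free_sockets, rng):
    """Sample a graph compatible with an admissible multiset of types.

    free_sockets maps each position to a list of socket identifiers
    (hypothetical input format). Sockets are drawn without replacement.
    """
    free = {z: list(s) for z, s in free_sockets.items()}  # do not mutate input
    graph = []
    for alpha in types:
        check = []
        for z in alpha:
            i = rng.randrange(len(free[z]))  # uniform among free sockets at z
            free[z][i], free[z][-1] = free[z][-1], free[z][i]
            check.append(free[z].pop())  # the socket becomes occupied
        graph.append(tuple(check))
    return graph
```

Each drawn check has exactly the prescribed type, and no socket is ever used twice.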

We also trivially extend this definition to the case of a ran- 
dom graph generated by a random multiset of types. This latter 
random object will be typically a list of independent random 
types of one of the three kinds connected, disconnected and 
coupled. For the sake of precision, in case the multiset of types 
is not admissible (by this we mean m-admissible, where m 
will be fixed later), we define the generated random graph to 
be the empty one. 

We now introduce a quantity inspired by statistical 
physics that plays an important role in what comes next, 
namely the positional overlap functions. Fix a configuration 
graph G, a channel realization h, and the number r of replicas 
of the measure $\mu_{G,h}$. Let $F_z \subseteq S_z$ be the set of free sockets 
at position z (free sockets being those that do not appear in 
any check constraint of G). The positional overlap functions 
$Q_z$, indexed by a position z, are defined by 

\[
Q_z(\sigma^{(1)}, \dots, \sigma^{(r)}) = \frac{1}{|F_z|} \sum_{s \in F_z} \sigma_s^{(1)} \cdots \sigma_s^{(r)}. \tag{19}
\]

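The next statement describes the link between the overlap 
functions and the replica averages introduced by Lemma 5. 

Lemma 7. Given a number $m > K^2$, a fixed channel 
realization, a fixed graph G whose associated type set is 
m-admissible and a fixed type $\alpha$, we have 

\[
\mathbb{E}_{a:\nu(\alpha,G)} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G
= \Big\langle \prod_{k=1}^{K} Q_{\alpha_k}(\sigma^{(1)}, \dots, \sigma^{(r)}) \Big\rangle_G + o(1). \tag{20}
\]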

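Proof: The left hand side is nothing else than the average 
over all possible a that are compatible with the type $\alpha$ and 
connect to free sockets. In other words, 

\[
\mathbb{E}_{a:\nu(\alpha,G)} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G
= \frac{1}{|B_\alpha|} \sum_{a \in B_\alpha} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G. \tag{21}
\]

The goal is to factorize the sum, but the fact that 
sockets are not replaced makes it a bit harder. Suppose that, 
contrary to our current model, free sockets are allowed to be 
chosen with replacement, that is, it is possible to have $a_i = a_j$ 
for $i \ne j$. Let $B'_\alpha$ be the set of all (pseudo-)check constraints 
that are compatible with $\alpha$, and where sockets are allowed to 
appear multiple times. Then $B'_\alpha$ can be written as a product: 

\[
B'_\alpha = F_{\alpha_1} \times \dots \times F_{\alpha_K},
\]

where the set $F_z$ is the set of free sockets at position z. The 
idea is now that we can replace $B_\alpha$ with $B'_\alpha$ in the average (21) 
without losing too much, while gaining the ability to factorize 
the sum. 

The relation between the two, which is proven in 
Appendix A, is 

\[
\frac{1}{|B_\alpha|} \sum_{a \in B_\alpha} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G
= \frac{1}{|B'_\alpha|} \sum_{a \in B'_\alpha} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G + O(1/m). \tag{22}
\]

Now we are in a better position, since on the r.h.s. any entry 
$a_i$ is chosen independently of the others. We rewrite the sum 
over $B'_\alpha$ in the following way: 

\[
\frac{1}{|B'_\alpha|} \sum_{a \in B'_\alpha} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G
= \frac{1}{|F_{\alpha_1}| \cdots |F_{\alpha_K}|} \sum_{a_1 \in F_{\alpha_1}} \cdots \sum_{a_K \in F_{\alpha_K}}
\Big\langle \prod_{k=1}^{K} \sigma_{a_k}^{(1)} \cdots \sigma_{a_k}^{(r)} \Big\rangle_G.
\]

Taking the bracket outside and factorizing, we obtain 

\[
\Big\langle \prod_{k=1}^{K} \Big( \frac{1}{|F_{\alpha_k}|} \sum_{s \in F_{\alpha_k}} \sigma_s^{(1)} \cdots \sigma_s^{(r)} \Big) \Big\rangle_G,
\]

which we can identify as the bracketed product of positional 
overlap functions on the right hand side of (20). ■ 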

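Lemma 8. Let G be a graph whose type multiset is 
m-admissible, and fix the channel realization h. Then the 
following inequalities hold: 

\[
\mathbb{E}_{\alpha:\mathrm{conn},\, a:\nu(\alpha,G)} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G
\le \mathbb{E}_{\alpha:\mathrm{coup},\, a:\nu(\alpha,G)} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G + O(1/m), \tag{23}
\]

\[
\mathbb{E}_{\alpha:\mathrm{coup},\, a:\nu(\alpha,G)} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G
\le \mathbb{E}_{\alpha:\mathrm{disc},\, a:\nu(\alpha,G)} \big\langle \sigma_a^{(1)} \cdots \sigma_a^{(r)} \big\rangle_G + O(1/m). \tag{24}
\]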

Proof: The claim follows from Lemma 7 if we manage to
show the following two inequalities:

^a:conn {Qai ' ' ' Q a k) ^ ^^^arcoup {Qai ' ' ' Qcxk )i (25) 
-I^aicoup {Qai ' ' ' Q a k) ^ l^^Q:disc {Qai ' ' ' Qcxk ); (26) 

where the dependence of the positional overlap functions on
the spin systems $\sigma^{(1)}, \sigma^{(2)}, \dots$ has been dropped in order to lighten
notation.



We rewrite the quantities above as follows: 

{ai,. ..,aK) \\ ^e[L] 

z'e[L] (ai,...,QK') 

e{z' ,...,z'+w~i}"^ 

\ z'e[L] \ z^z' 
IEa:disc {Qai ' ' ' Qolk) ^ 



(27) 




(28) 



(29) 



zelL] 



Both inequalities (25) and (26) are proved by an application
of Jensen's inequality using the convexity of the function
$x \mapsto x^K$ for even $K$. ■
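As a numerical sanity check of the Jensen step, the following sketch compares the three averages of a product of $K$ overlap values for one arbitrary choice of numbers $Q_z$. It rests on assumptions that are not taken from the paper: the $Q_z$ are fixed reals in $[-1, 1]$ rather than functions of the replicas, positions are treated cyclically so that every window of width $w$ has the same size, and the function names are hypothetical. Under these assumptions the uniform average is the $K$-th power of a mean, the disconnected average is a mean of $K$-th powers, and the windowed average sits in between.

```python
import random


def conn_average(Q, K):
    """All K positions drawn uniformly and independently from [L]."""
    return (sum(Q) / len(Q)) ** K


def coup_average(Q, K, w):
    """Window start uniform in [L]; K positions uniform inside the (cyclic) window."""
    L = len(Q)
    total = 0.0
    for start in range(L):
        window_mean = sum(Q[(start + j) % L] for j in range(w)) / w
        total += window_mean ** K
    return total / L


def disc_average(Q, K):
    """All K positions equal to one uniformly chosen position."""
    return sum(q ** K for q in Q) / len(Q)


rng = random.Random(1)
L, w, K = 12, 3, 4                 # K even, so x -> x**K is convex
Q = [rng.uniform(-1.0, 1.0) for _ in range(L)]
conn = conn_average(Q, K)
coup = coup_average(Q, K, w)
disc = disc_average(Q, K)
print(conn, coup, disc)            # ordered as conn <= coup <= disc
```

Setting `w = 1` reproduces the disconnected average and `w = L` the connected one, which is why the coupled case interpolates between the two.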

VI. The interpolation 

We now move a bit further and consider random ensembles 
of graphs. These are obtained in the following way: first 
we prescribe the numbers of random types of each kind 
that we want, i.e. how many types should be connected, 
disconnected and coupled. Afterwards, the random types are 
sampled according to the distributions prescribed. Finally the 
graph is chosen uniformly to match the multiset of types, in 
the spirit of the previous section. 

We use the notation $G : \{t_1 \times \mathrm{coup},\, t_2 \times \mathrm{disc}\}$ to say that $G$ is sampled
in the way outlined above, where $t_1$ and $t_2$ are the number
of random types of the coupled kind and disconnected kind,
respectively. Of course, we could specify any combination of
the three kinds, conn included.

Now we need to set the number of check nodes in the 
ensemble. There are two conflicting constraints we would like 
to satisfy: first, the set of types needs to be admissible with 
high probability — so that the sampled graph exists in the 
form we want; second, the number of free sockets that remain 
should be small, in the sense that the proportion of free sockets 
needs to vanish in the limit. 

The average number of check nodes needed to use all
available sockets is (ideally) $NLd/K$. However, there is a
fluctuation ($\pm N^{1-\gamma} d$ at each position) in the number of
available sockets, and it might not be possible to connect actual
check nodes to all sockets (for example, because of window
constraints). As a consequence, we choose the actual size of
the graph (by this we mean the number of multi-edges, i.e.
check nodes) to be $T = NLd(1 - N^{-\gamma})/K$, so in case
the graph is admissible there will be $O(N^{1-\gamma})$ free sockets
left at each position. The exponent $\gamma$ is arbitrary, as long as
$0 < \gamma < 1/2$. The next lemma confirms that by using this
value for $T$, the resulting set of types is admissible with high
probability.

Lemma 9. Let $\alpha^1, \dots, \alpha^T$ be random types, each drawn from
a distribution that is either conn, disc or coup (possibly
different for each type). Then with high probability (more
precisely, $1 - O(\exp(-kN^{1-2\gamma}))$ for some positive constant
$k$) the resulting multiset of types is $dN^{1-\gamma}/2$-admissible.

Proof: The plan is the following: fix a position $z$, and
show that the number of appearances of $z$ as entries of
$\alpha^1, \dots, \alpha^T$ exceeds $TK/L + dN^{1-\gamma}/2$ only with a very small
probability. Next, by the union bound over all positions $z$, we
upper bound the probability that the graph is not
$dN^{1-\gamma}/2$-admissible, and the lemma is proved.

We concentrate on the above claim, and define $X_t$ to be
the number of entries in $\alpha^t$ equal to $z$, for $1 \le t \le T$.
Clearly the $X_t$ are independent, bounded and their expectation
equals $K/L$ (the choice of distribution of $\alpha^t$ is immaterial
as long as it is one of conn, disc or coup). Then by
Hoeffding's inequality, the probability that $\sum_t X_t$ deviates
from its expectation $TK/L$ decays very fast. More exactly,



T 

E 

t=i 



Xt > 



TK 
IT 



2 



< exp 



(30) 



which proves the claim. ■ 
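To get a feeling for the orders of magnitude in this concentration argument, one can simulate the per-position socket usage for uniformly drawn types and compare it with the threshold $TK/L + dN^{1-\gamma}/2$. The sketch below is an illustration only: the parameter values are arbitrary, the types are drawn from the uniform (conn) distribution, and the tail estimate uses the textbook form of Hoeffding's inequality for independent variables bounded in $[0, K]$, which is not necessarily the exact constant used in (30).

```python
import math
import random


def hoeffding_tail(T, K, slack):
    """Textbook Hoeffding bound on P[sum_t X_t - E >= slack] for T independent X_t in [0, K]."""
    return math.exp(-2.0 * slack ** 2 / (T * K ** 2))


def max_position_load(T, K, L, rng):
    """Draw T uniform types in [L]^K and return the largest per-position entry count."""
    counts = [0] * L
    for _ in range(T * K):
        counts[rng.randrange(L)] += 1
    return max(counts)


# Illustrative parameters (assumptions, not values from the paper).
N, L, d, K, gamma = 10_000, 4, 3, 6, 0.25
T = round(N * L * d * (1 - N ** (-gamma)) / K)   # number of check nodes
slack = d * N ** (1 - gamma) / 2                 # allowed excess per position
threshold = T * K / L + slack
load = max_position_load(T, K, L, random.Random(0))
print(T, load, threshold, hoeffding_tail(T, K, slack))
```

With these values the observed maximum load stays well below the threshold, in line with the claim that non-admissible realizations are rare.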
The previous lemma essentially allows us to take the expec- 
tation over an ensemble of graphs without caring too much 
about non-admissibility. We are now ready to prove a key 
result, expressed as the following lemma. 

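Lemma 10. The following two inequalities hold:

$\mathbb{E}_{h,G:\{T\times\mathrm{conn}\}} \ln Z(G) \le \mathbb{E}_{h,G:\{T\times\mathrm{coup}\}} \ln Z(G) + O(N^{\gamma}),$ (31)

$\mathbb{E}_{h,G:\{T\times\mathrm{coup}\}} \ln Z(G) \le \mathbb{E}_{h,G:\{T\times\mathrm{disc}\}} \ln Z(G) + O(N^{\gamma}).$ (32)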

Proof: We only discuss the first of the two inequalities, 
since the proof of the other is identical. We will set up 
a chain of inequalities, at the ends of which sit the two 
quantities that we need to compare. This is the main idea 
of the interpolation method: finding a sequence of objects 
that transition "smoothly" between two objects that can differ 
significantly. In our case, it is easily seen that the claim follows 
if we are able to show that 



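$\mathbb{E}_{h,G:\{(t+1)\times\mathrm{conn},\,(T-t-1)\times\mathrm{coup}\}} \ln Z(G) \le \mathbb{E}_{h,G:\{t\times\mathrm{conn},\,(T-t)\times\mathrm{coup}\}} \ln Z(G) + O(N^{\gamma-1}).$ (33)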

The two ensembles involved in inequality (31) lie at the
endpoints of a chain of $T$ inequalities of the form above, with
$t$ moving from 0 to $T - 1$. The crucial observation here is that
the two ensembles $\{(t+1)\times\mathrm{conn},\,(T-t-1)\times\mathrm{coup}\}$ and
$\{t\times\mathrm{conn},\,(T-t)\times\mathrm{coup}\}$ can
both be obtained by sampling a graph $G$ from their common
part, $\{t\times\mathrm{conn},\,(T-t-1)\times\mathrm{coup}\}$, and, in case $G$ is not null, adding an
extra random check constraint sampled according to conn and
coup, respectively. The plan is to show that inequality (33)
holds also when $G$ is fixed, and then to average over $G$.

Let us fix $m = dN^{1-\gamma}/2$, and let us first deal with the
case when the realization of the ensemble $\{t\times\mathrm{conn},\,(T-t-1)\times\mathrm{coup}\}$
is not $m$-admissible. This event occurs with a very small
probability, subexponential according to Lemma 9. Since
$\ln Z(G) = O(N)$ (according to Lemma 5), the error incurred
by ignoring this case is extremely small and fits in the
error term tolerated in (33).

Otherwise, $G$ is such that there are at least $m$ free sockets
at every position, and we need to show that

$\mathbb{E}_h \mathbb{E}_{a:\mathrm{conn}} \ln Z(G \cup a) \le \mathbb{E}_h \mathbb{E}_{a:\mathrm{coup}} \ln Z(G \cup a).$

We subtract $\ln Z(G)$ on both sides and then use Lemma 5
to write the difference of log partition functions as a linear
combination of replica averages (brackets) of the kind bounded in Lemma 8, after
which we can readily apply Lemma 8 and the claim follows.
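The way the single-step inequalities combine can be written as a telescoping sum. The display below is a schematic restatement in notation introduced here only for convenience: $\Phi_t$ and $\varepsilon$ are hypothetical symbols, with $\varepsilon$ standing for whatever per-step error term is tolerated in (33).

```latex
\Phi_t := \mathbb{E}_{h,\,G:\{t\times\mathrm{conn},\,(T-t)\times\mathrm{coup}\}} \ln Z(G),
\qquad
\Phi_T - \Phi_0 \;=\; \sum_{t=0}^{T-1} \bigl(\Phi_{t+1} - \Phi_t\bigr) \;\le\; T\,\varepsilon
\quad \text{whenever } \Phi_{t+1} - \Phi_t \le \varepsilon \text{ for all } 0 \le t \le T-1 .
```

Here $\Phi_T$ is the average over the $\{T\times\mathrm{conn}\}$ ensemble and $\Phi_0$ the average over $\{T\times\mathrm{coup}\}$, so the accumulated error is at most $T$ times the per-step error.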



VII. Retrieving the original LDPC ensembles 

We will now investigate further the connection between 
the ensembles {T x conn} and {T x disc}. In fact, they 
are both variants of the uncoupled ensembles introduced in 
the beginning of Section II. The first one is very similar
to LDPC($NL$, $\Lambda$, $K$), and the second one is similar to $L$
copies of LDPC($N$, $\Lambda$, $K$). The only differences that occur
are related to the case where there is a large deviation in the
number of sockets generated in the first stage, or when the
multisets of types generated by $\{T \times \mathrm{conn}\}$ and $\{T \times \mathrm{disc}\}$
are not admissible. Also, since the first stage of the ensemble
generation, where we obtain the configuration pattern, is the
same in all cases, we condition on the event that the
configuration pattern is known and that it satisfies the condition
stated at the beginning of Section V, namely that the number
of sockets at each position is $Nd\,(1 \pm O(N^{-\gamma}))$.

We can easily see that the ensemble $\{T \times \mathrm{disc}\}$,
conditioned on the fact that its realization is admissible, can be
extended to $L$ copies of the simple ensemble on $N$ variable
nodes by adding $O(N^{1-\gamma})$ extra check constraints. Thus the
scaled log partition function is the same up to a sublinear term.

Can we say the same about the ensemble $\{T \times \mathrm{conn}\}$
and the simple ensemble on $NL$ variable nodes? Yes, but it
requires a lengthier argument. Let us look closer at the latter.
This ensemble is not generated using types (since positions
play no role here), but we can still count the occurrences of
the various types that appear in it. There are exactly $L^K$ different
types, and the next proposition estimates the probability that
a particular random check constraint in the simple ensemble
LDPC($NL$, $\Lambda$, $K$) has a certain type. The crux of the
problem is that in the $\{T \times \mathrm{conn}\}$ ensemble the types are
generated uniformly, whereas in the simple ensemble a position
with considerably more occupied sockets than other positions
has a smaller chance of being picked.



We will proceed by transforming the ensemble
LDPC($NL$, $\Lambda$, $K$) (the simple ensemble) into $\{T \times \mathrm{conn}\}$
(the connected ensemble) through only a small number of
check additions and deletions. Let $X_\alpha$ be the number of
check nodes of type $\alpha$ that occur in a realization of the
simple ensemble. For every type $\alpha$, let $Y_\alpha$ be a random
variable sampled according to $\mathrm{Bin}(T, L^{-K})$. If $X_\alpha > Y_\alpha$,
then exactly $X_\alpha - Y_\alpha$ check nodes of type $\alpha$, selected
uniformly at random from the existing ones, are deleted from
the simple ensemble. Otherwise, exactly $Y_\alpha - X_\alpha$ check
nodes of type $\alpha$ are chosen uniformly at random from all
possible combinations of compatible free sockets and inserted
in the graph without replacement. All insertions of check
nodes must occur after all deletions have been performed (the
order of the types is important). If at any stage there are no
free sockets at a particular position to choose from, it just
means the underlying multiset of types (which here is given
by the numbers $Y_\alpha$) is not T-admissible, and we produce the
trivial code.
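The bookkeeping of this transformation can be sketched in a few lines. The code below is a toy illustration, not the construction itself: it only computes how many check nodes of each type would be deleted or inserted, the counts `X` of the simple ensemble are replaced by a uniform stand-in, and the helper names `sample_binomial` and `plan_edits` are hypothetical.

```python
import random


def sample_binomial(trials, p, rng):
    """Draw from Bin(trials, p) as a sum of Bernoulli variables."""
    return sum(1 for _ in range(trials) if rng.random() < p)


def plan_edits(simple_counts, target_counts):
    """Per type: number of deletions (X > Y) and insertions (Y > X)."""
    types = set(simple_counts) | set(target_counts)
    deletions, insertions = {}, {}
    for t in types:
        x = simple_counts.get(t, 0)
        y = target_counts.get(t, 0)
        deletions[t] = max(x - y, 0)
        insertions[t] = max(y - x, 0)
    return deletions, insertions


rng = random.Random(0)
L, K, T = 2, 2, 400
types = [(i, j) for i in range(L) for j in range(L)]   # all L**K types

# Stand-in for the type counts X of the simple ensemble (assumption: uniform).
X = {t: 0 for t in types}
for _ in range(T):
    X[rng.choice(types)] += 1

# Target counts Y, one binomial draw per type.
Y = {t: sample_binomial(T, L ** (-K), rng) for t in types}

deletions, insertions = plan_edits(X, Y)
print(sum(deletions.values()), sum(insertions.values()))
```

After applying the planned deletions and then the insertions, every type occurs exactly $Y_\alpha$ times, which is the property the argument relies on.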

In order to bound the number of check node insertions
and deletions, we compute the first and second moments of
$X_\alpha - Y_\alpha$. The total number of check nodes $M$ in the simple
ensemble is fixed for our purposes (it depends only on the
configuration pattern), so we can write $X_\alpha = \sum_a R_a^\alpha$, where
$R_a^\alpha$ is the indicator random variable of the event that check
node $a$ has type $\alpha$, and the sum ranges over all $M$ check
nodes.

Proposition 11. The expectation and variance of $X_\alpha - Y_\alpha$
are given by



$\mathbb{E}[X_\alpha - Y_\alpha] = O(N^{1-\gamma}),$ (34)

$\mathrm{Var}[X_\alpha - Y_\alpha] = O(N^{2-\gamma}).$ (35)



Proof: We determine first the probability $\mathbb{E} R_a^\alpha$ that a
check node $a$ has type $\alpha$. This event happens if and only if all
sockets $a_i$ to which $a$ is connected are placed at positions $\alpha_i$.
For this, we need to evaluate the proportion of free sockets
at each position (all sockets are free initially, because w.l.o.g.
we can say that $a$ is the first check node to be allocated). The
number of sockets at any position is between $Nd(1 - N^{-\gamma})$
and $Nd(1 + N^{-\gamma})$; the number of occupied sockets is at
most $K - 1$ (from previous edges). Thus, the probability that
$\mathrm{pos}(a_i) = \alpha_i$ is lower-bounded by



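$\frac{Nd(1 - N^{-\gamma}) - K}{NLd(1 + N^{-\gamma})} = \frac{1}{L} - O(N^{-\gamma})$

and, likewise, upper-bounded by

$\frac{Nd(1 + N^{-\gamma})}{NLd(1 - N^{-\gamma})} = \frac{1}{L} + O(N^{-\gamma}).$

It then follows that

$\mathbb{E} R_a^\alpha = \frac{1}{L^K} + O(N^{-\gamma}).$ (36)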



For the second moments we need $\mathbb{E}[R_a^\alpha R_b^\beta]$, i.e. the
probability that $a$ and $b$ have types $\alpha$ and $\beta$ at the same
time. The reasoning is essentially similar to the previous case,
only now there are $2K$ edges to connect and at most $2K - 1$
occupied sockets (by symmetry we can arrange that $a$ and $b$
are the first two check nodes to be allocated). Then we have



$\mathbb{E}[R_a^\alpha R_b^\beta] = \left(\frac{1}{L} + O(N^{-\gamma})\right)^{2K} = \frac{1}{L^{2K}} + O(N^{-\gamma}).$ (37)



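By summing over all check nodes, we get $\mathbb{E} X_\alpha = \frac{M}{L^K} + O(N^{1-\gamma})$
and, after elementary calculations, $\mathrm{Var}\, X_\alpha = O(N^{2-\gamma})$.
Since $Y_\alpha$ is binomially distributed, and using
$T = M + O(N^{1-\gamma})$, we have

$\mathbb{E} Y_\alpha = \frac{T}{L^K} = \frac{M}{L^K} + O(N^{1-\gamma})$

and also

$\mathrm{Var}\, Y_\alpha = \left(1 - \frac{1}{L^K}\right)\frac{T}{L^K} = O(N),$

which is much smaller than $\mathrm{Var}\, X_\alpha$. ■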
To show that the number of inserted and deleted check nodes
is small, we now employ the Chebyshev inequality, which, for
some value of the parameter $\zeta$ to be fixed shortly, reads



\Xa -Ya-O {N^-'^) I > N'^O (n'^-^ 



< 



1 



We fix the values C = f ™d 7 



^ (these choices are 



somewhat arbitrary), and simplifying we obtain 



\Xa 



Ya\ >0 



< N' 



Using the union bound over all possible types, the
probability that the number of insertions and
deletions fails to be sublinear in the way depicted above remains
$O(N^{-\gamma/2})$. In case the number of insertions and deletions
is too large, we use the fact that $\ln Z(G)$
is always $O(N)$ (see Lemma 5). This proves the following
lemma.

Lemma 12. Transmitting over a BMS channel, we have 



E, 



/i,G:LDPC(AfL,A,if )ln Z (G) 



> E 



h,G:{Txc 



,jlnZ(G) + o(iVi-^) 



VIII. The large N limit 

This section wraps up the proof of Theorem 2. The main
ingredient is the content of Lemma 10, which can be written
as

E„,G:{Txconn}lnZ(G) - O {N'-') < 

< E;i,G:{Txcoup}ln^(G) < 

< IE/.,G:{Txdisc}ln Z{G) + O (N^-^) . (38) 

Using the results from the previous section on the
comparison with the simple ensembles and scaling everything by $NL$,
we obtain



NL 



IE,.,G:LDPC(iVL,A,K)ln^(G) - O {N^--^) < 

< -^Eft,G:{Txcoup}ln2'(G) < 

< ^IE/.,G:LDPC(iv,A,if)ln^(G) + O {N'-^) 



(39) 



The next step is to take the $N \to \infty$ limit, and in case
it exists for the outer terms, which we are about to show,
we can apply the "sandwich rule" to obtain Theorem 2. Note
that the ensemble appearing in the middle is what we call
LDPC($N$, $L$, $w$, $\Lambda$, $K$). We are of course not obliged to pick
it as such: we could do another level of processing in the style
of the previous section. However, the current form is known to
fulfill the Maxwell conjecture, so we need not go any further.
To show that the limit

$\lim_{N\to\infty} \frac{1}{N}\, \mathbb{E}_{h,G:\mathrm{LDPC}(N,\Lambda,K)} \ln Z(G)$

exists, we use the following result, whose proof can be found 
in the Appendix of Q. 

Lemma 13 (The modified superadditivity theorem). Given
$\alpha \in (0, 1)$, suppose a non-negative sequence $\{a_N, N \ge 1\}$
satisfies

$a_{N_1+N_2} \ge a_{N_1} + a_{N_2} - O((N_1 + N_2)^{\alpha})$ (40)

for every $N_1, N_2 \ge 1$. Then the limit $\lim_{N\to\infty} a_N/N$ exists (it
may be $+\infty$).

The claim then follows by setting the sequence $a_N$ to be
the negative of the sequence we study (since the $\ln Z(G)$ are
negative). It remains to be shown that superadditivity indeed
holds.
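The hypothesis of Lemma 13 can be illustrated on an explicit sequence. The example below is purely illustrative and unrelated to the log partition function: the sequence $a_N = cN - N^{\alpha}$ with $c = 2$ and $\alpha = 1/2$ is a hypothetical choice, and the helper `is_almost_superadditive` only tests the inequality on a finite range with the constant in the $O(\cdot)$ term set to 1.

```python
def a(N, c=2.0, alpha=0.5):
    """Toy non-negative sequence: linear growth minus a sublinear correction."""
    return c * N - N ** alpha


def is_almost_superadditive(seq, alpha, n_max, constant=1.0):
    """Check seq(n1 + n2) >= seq(n1) + seq(n2) - constant * (n1 + n2)**alpha on a grid."""
    for n1 in range(1, n_max + 1):
        for n2 in range(1, n_max + 1):
            slack = constant * (n1 + n2) ** alpha
            if seq(n1 + n2) < seq(n1) + seq(n2) - slack - 1e-9:
                return False
    return True


ratios = [a(N) / N for N in (10, 1_000, 100_000)]
print(is_almost_superadditive(a, 0.5, 60), ratios)
```

For this sequence the ratios $a_N/N$ increase towards $c$, which is the kind of convergence the lemma guarantees.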

Since this part is a somewhat simpler variation of the
interpolation we have already seen, we only present a proof
sketch. We consider a coupled ensemble consisting of only
two positions ($L = 2$) and interpolate between the cases $w = 1$
(disconnected case) and $w = 2$ (connected case). The novelty
is that the numbers of variables at the first and second positions
differ: they are $N_1$ and $N_2$, respectively. For the connected
case, when edges from check nodes are connected, we do not
pick the position uniformly at random, but rather weigh the choice by
$v_1 = \frac{N_1}{N_1+N_2}$ and $v_2 = \frac{N_2}{N_1+N_2}$, respectively.

The only difference appears in the reasoning of Lemma 8,
where the types are not uniformly distributed anymore. The
types are now binary strings of length $K$, with the two symbols
denoting the position, one having weight $v_1$, the
other $v_2$. The weight of the type is the product of the weights
of the symbols it contains. If $\alpha$ is a type, let $v(\alpha)$ be the
weight of that type. Then Equations (27) and (29) become



ze{i,2} 

and clearly the lemma remains true in this case as well. 
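The weighted version of the Jensen step can be sanity-checked in the same spirit. The sketch below makes assumptions that are not stated in the text: the overlap values `Q` are fixed numbers, and the disconnected average is taken to weight position $z$ by $v_z$. Under these assumptions the weights of all binary types sum to one, the connected average collapses to $(v_1 Q_1 + v_2 Q_2)^K$, and convexity of $x \mapsto x^K$ for even $K$ orders the two averages.

```python
import itertools
import math


def type_weight(alpha, v):
    """Weight of a type: product of the weights of the positions it contains."""
    return math.prod(v[z] for z in alpha)


def weighted_conn_average(Q, v, K):
    """Sum over all binary types of v(alpha) * prod_i Q[alpha_i]."""
    return sum(
        type_weight(alpha, v) * math.prod(Q[z] for z in alpha)
        for alpha in itertools.product(range(2), repeat=K)
    )


def weighted_disc_average(Q, v, K):
    """Assumed disconnected counterpart: position z is chosen with weight v[z]."""
    return sum(v[z] * Q[z] ** K for z in range(2))


N1, N2, K = 300, 700, 4
v = (N1 / (N1 + N2), N2 / (N1 + N2))
Q = (0.8, -0.3)                     # arbitrary illustrative overlap values

total_weight = sum(type_weight(al, v) for al in itertools.product(range(2), repeat=K))
conn = weighted_conn_average(Q, v, K)
disc = weighted_disc_average(Q, v, K)
print(total_weight, conn, disc)
```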

IX. Conclusions 

The present analysis can be extended with almost no change
to arbitrary check-node degree distributions whose generating
polynomial $P(x) = \sum_{k \ge 0} P_k x^k$ is convex for $x \in [-1, 1]$.
Experimental evidence suggests that even this condition can
be relaxed, but new ideas seem to be required for the proofs. A
possible route would be to show self-averaging properties for
overlap functions, which would allow one to use the convexity of
$x \mapsto P(x)$ for $x \ge 0$, which holds for any degree distribution
(see [9] for a related approach).

The idea of using spatial coupling as a proof technique 
potentially goes beyond coding theory. We can use it to 
analyze the free energy of general spin glass models and find 
exact characterizations or bounds on their phase transition 
thresholds. We plan to come back to this problem in a 
forthcoming publication. 

Finally, let us also mention that recently, algorithmic lower
bounds to thresholds of constraint-satisfaction problems were
derived by comparing simple and spatially-coupled
constraint-satisfaction models (see [14]).
Appendix

Proposition 14. Given a fixed configuration graph $G$ whose
underlying type set is $m$-admissible for $m > K^2$ and a fixed
channel realisation $h$, then with the notation from the proof
of Lemma 7 we have that



\BJ ^ \ 



aeBa 



IB' I ^. \ " 



aeB' 



O 



(41) 



Proof: Rewrite the left hand side as 



\B'\ |B„ 



We will first find an estimate of the quantity $|B'_\alpha \setminus B_\alpha|$, i.e.
the number of (pseudo-)check constraints that connect to at
least one socket multiple times. To do this, let us look at the
subset of $B'_\alpha$ where $a_i = a_j$ (i.e. edges $i$ and $j$ connect to the
same socket), for some distinct $i, j$ with $1 \le i, j \le K$. The
cardinality $q_{ij}$ of this subset is 0 if $\alpha_i \neq \alpha_j$, and is equal to
$|B'_\alpha| / |F_{\alpha_i}| \le |B'_\alpha| / m$ if $\alpha_i = \alpha_j$.

A (rough) upper bound for $|B'_\alpha \setminus B_\alpha|$ is then given by the sum
$\sum_{i,j} q_{ij}$, which in turn never exceeds $K^2 |B'_\alpha| / m$.
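The counting bound can be verified by brute force on a small example. The sketch below is illustrative only: the socket counts in `sizes` and the type `alpha` are arbitrary choices, `m` is taken as the smallest number of free sockets over the positions, and `count_constraints` is a hypothetical helper that enumerates all pseudo-check constraints and counts those that reuse a socket.

```python
import itertools


def count_constraints(sizes, alpha):
    """Return (|B'_alpha|, |B'_alpha minus B_alpha|) by explicit enumeration."""
    total, repeated = 0, 0
    for choice in itertools.product(*(range(sizes[z]) for z in alpha)):
        sockets = list(zip(alpha, choice))   # (position, socket index) pairs
        total += 1
        if len(set(sockets)) < len(sockets):
            repeated += 1                    # some socket appears at least twice
    return total, repeated


sizes = [20, 25]        # number of free sockets at positions 0 and 1
m = min(sizes)          # the type set is m-admissible for this m
K = 3
alpha = (0, 0, 1)

total, repeated = count_constraints(sizes, alpha)
print(total, repeated, K**2 * total / m, total / (total - repeated), m / (m - K**2))
```

In this example only the pair of edges sharing position 0 can collide, so the rough bound $K^2 |B'_\alpha| / m$ is far from tight, as expected.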

We are now able to bound the ratio $|B'_\alpha| / |B_\alpha|$ appearing in
(42) by $m/(m - K^2)$. Indeed, this follows from

$|B_\alpha| = |B'_\alpha| - |B'_\alpha \setminus B_\alpha| \ge |B'_\alpha| \, (1 - K^2/m).$

The absolute value of the second sum in (42) is clearly
upper-bounded by $|B'_\alpha \setminus B_\alpha|$, since the bracket takes values
between 0 and 1. Putting everything together, we obtain



< 



aeB' 



\Ba\ ^ 



> 



\B' 



m - 

I \ a a 



\B'J ^, \ " 



r) 